Abstract

Probability theory can be modified in essentially one way while 
maintaining consistency with the basic Bayesian framework. This 
modification results in copies of standard probability theory for real, 
complex or quaternion probabilities. These copies, in turn, allow one 
to derive quantum theory while restoring standard probability theory
in the classical limit. The argument leading to these three copies
constrains physical theories in the same sense that Cox's original
arguments constrain alternatives to standard probability theory. This
sequence is presented in some detail with emphasis on questions be- 
yond basic quantum theory where new insights are needed. 



1 Introduction 



If it weren't for the weight of history, it would seem natural to take quantum 
mechanical phenomena as an indication that something has gone wrong with 
probability theory and to attempt to explain such phenomena by modifying 
probability theory itself, rather than by invoking quantum mechanics. It is 
actually easy to take this point of view because probability theory is so tightly 
constrained by Cox's Bayesian arguments that there is only one plausible 
try. Trying this anyway [2-5], one finds that Cox's arguments work even
without the assumption that probabilities are real and non-negative and one
obtains "exotic" copies of standard probability theory where the probabilities
may belong to any real associative algebra with unit. With probability theory
modified, there is no need for the usual "wave-particle duality" and one is
free to assume, for example, that a particle in $R^3$ is somewhere in $R^3$ at
each time. Introducing such "state spaces" and assuming that probabilities
have a square norm, exotic probabilities acquire the power to predict real
non-negative frequencies and are limited to three algebras: reals, complex
numbers and quaternions. Given this framework, complex probabilities with
state spaces $R^3$ or $R^4$ lead to the standard quantum theory in complete
detail including the Schrödinger equation and "mixed states." Quaternionic
probabilities lead, on the other hand, to the Dirac theory. Although one
might expect such theories to be ruled out by Bell's arguments, modifying
probability theory turns out to evade this and similar restrictions [3]. Because
of the simple nature of the state space axioms and the Bayesian nature of the
exotic probabilities, the familiar semi-paradoxical measurement and observer
questions from quantum theory do not arise. One has a theory which is
quite substantially simpler than quantum mechanics both conceptually and 
mathematically. 

Although predictions within state spaces like $R^3$ and $R^4$ agree with standard
quantum mechanics, Srinivasan has realized that one should expect even
more interesting results in field theory because exotic probability theory cannot
produce the apparent divergences which are so common in quantum field
theory. Indeed, he has shown that with his quaternionic probability version
of canonical quantization, he gets the correct result for the Lamb shift
without any renormalization procedure.

This paper is intended as a review of the basic results from references 2-5 
with more detail than is practical in letter sized papers, as a starting point 
for someone interested in this general subject and as an exposition of unan- 
swered questions where further research is needed. The idea that probability 
theory might be altered in some way goes back at least to Dirac [9]. For a
history of this idea, the review by Muckenheim et al. [10] is a good starting
point. Related ideas can be found in papers by Srinivasan and Sudarshan [6, 7, 8],
Gudder [11], Feynman [12], Tikochinsky [13], Fröhner [14], Caticha [15],
Steinberg [16], Belinskii [17], Miller [18], Muckenheim [19], Khrennikov [20] and
Pitowsky [21]. This work is strongly influenced by the Bayesian view of
probability theory due to Ed Jaynes [22, 23, 24, 25].



2 Cox arguments



In the Bayesian view of probability theory, probabilities begin as real
non-negative numbers assigned to pairs (a, b) of arbitrary propositions. These
numbers are meant to indicate, in some sense to be defined, how likely it
is that proposition b is true given that proposition a is known. Given this
setup, Cox argued [1] that if such an assignment of numbers is to be useful as a
likelihood, it should satisfy a few plausible conditions. He then demonstrates
(it is not a proof for reasons which will be clear below) that these conditions 
lead unambiguously to the standard Bayesian presentation of probability 
theory. The basic plan is to simply follow Cox's work while dropping the 
assumption that probabilities are real and non-negative. 

Before beginning, there are a couple of technical points which might cause
confusion. Cox [1] and Jaynes [25] discuss probability theory without any
restriction on propositions. The idea is that probability theory is meant to be
"the logic of science" and is meant to be treated slightly informally in the 
same sense that ordinary logic is treated slightly informally in mathematics. 
However, for definiteness, and since we will introduce several copies of prob- 
ability theory, we work in a distributive lattice. The other technical point 
is that Cox, Jaynes and my previous papers work in a Boolean lattice as 
opposed to a distributive lattice. It is easier to deal with a plain distributive 
lattice and this makes no difference for the results in references 2-5. 

Consider a set $P$ and a distributive lattice $L$ with "propositions" $a, b, c \in L$,
minimum element $0 \in L$ and maximal element $1 \in L$. For a function
$\to\, : L \times L \to P$ to be a useful measure of "likelihood," we expect, following
Cox [1], that $(a \to b)$ and $(a \wedge b \to c)$ should determine $(a \to b \wedge c)$, and we
denote the implied function by $* : P \times P \to P$. Similarly, if $b \wedge c = 0$, we also
expect that $(a \to b)$ and $(a \to c)$ should determine $(a \to b \vee c)$, and we denote
this function by $+ : P \times P \to P$. Mathematically speaking, Cox's point is
that the structure of $L$ has implications for $*$ and $+$. For example, for any
$a, b, c, d \in L$, we have

$$(a \to b \wedge c \wedge d) = (a \to b) * (a \wedge b \to c \wedge d) = (a \to b) * [(a \wedge b \to c) * (a \wedge b \wedge c \to d)] \tag{1}$$

and using the associativity of $\wedge$,

$$(a \to b \wedge c \wedge d) = [(a \to b) * (a \wedge b \to c)] * (a \wedge b \wedge c \to d). \tag{2}$$

Letting $x = (a \to b)$, $y = (a \wedge b \to c)$ and $z = (a \wedge b \wedge c \to d)$, we have

$$x * (y * z) = (x * y) * z \tag{3}$$

for all such triples $(x, y, z)$. Following Cox, we further assume that $*$ is
associative in general.

Similarly, suppose that we have $a, b, c \in L$ with $b \wedge c = 0$. Then
$(a \to b \vee c) = (a \to b) + (a \to c) = (a \to c) + (a \to b)$. We then plausibly assume
that $+$ is commutative in general.

One can easily complete this picture by checking properties of $L$ to see what
is correspondingly expected in $P$.



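Property of L                        Expected property of P
-----------------------------------  ------------------------------------------------
$\wedge$ is associative              $*$ is associative
$\vee$ is associative                $+$ is associative
$\wedge$ is commutative              (no requirement)
$\vee$ is commutative                $+$ is commutative
$\wedge$ distributes over $\vee$     $*$ distributes both ways over $+$
$\vee$ distributes over $\wedge$     (no requirement)
0 is the minimum                     P has an additive identity "0"
1 is the maximum                     P has a two sided multiplicative identity "1"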



Although the usual $[0, 1] \subset R$ probabilities satisfy these conditions, they
are only one possibility. At this stage, any ring will do, even a ring with
non-commutative multiplication like the quaternions. Actually, the fact that
we have to explain interference effects strongly suggests that we will need
probabilities with an additive inverse. Plausibly also requiring scaling of 
probabilities by real numbers, we assume, at this stage, that the probabilities 
of interest are real associative algebras with unit. Further restrictions are to 
come in section 3. 

3 Predicting frequencies 

The exotic probabilities of the last section seem exotic mainly because we 
are immediately familiar with what, say, $P(b|a) = 0.25$ means in terms of an
experiment. On the other hand, what is the predictive meaning of something
like $(a \to b) = 2 + 3i$? To answer this, it is helpful to realize that this problem
already exists even in standard probability theory. There is nothing in probability
theory as such that tells us that probability $P(b|a) = 0.25$ means that 25%
should be expected in the corresponding frequency. This must be deduced
from additional assumptions. In the standard probability case, one considers
$N$ copies of the situation where $a$ was known. One then observes that
the probability that $b$ is true in a fraction $n/N$ of the copies peaks at
$n/N = 0.25$, and for any interval
containing 0.25, the probability to be outside the interval can be reduced
as much as one wants by increasing $N$. Roughly speaking, the frequency
meaning of standard probabilities is fixed by the additional assumption that
"probability zero propositions never happen." It may help to notice that, as
Jaynes points out, standard probability theory works equally well on the
interval $[1, \infty]$ rather than $[0, 1]$. In this case, probability 4.0 would predict
frequency 0.25 and one would be assuming that propositions with probability
$\infty$ never happen.
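
The standard frequency argument above can be checked numerically. The following is a minimal Python sketch, assuming independent copies so that the number of copies in which b is true is binomially distributed; the function name `prob_outside` and the interval half-width 0.05 are illustrative choices and do not come from the text.

```python
from math import comb

def prob_outside(p, N, eps):
    """Probability that the observed frequency n/N falls outside [p - eps, p + eps]
    when each of N independent copies has probability p (binomial model)."""
    total = 0.0
    for n in range(N + 1):
        if abs(n / N - p) > eps:
            total += comb(N, n) * p**n * (1 - p)**(N - n)
    return total

# The probability of a frequency far from 0.25 shrinks as N grows.
for N in (10, 100, 1000):
    print(N, prob_outside(0.25, N, 0.05))
```

The printed values decrease with N, which is the sense in which the frequency 0.25 is predicted once "probability zero propositions never happen" is assumed.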

In the case of exotics, we cannot proceed quite as simply as in standard 
probability theory since, as will become clear, zero probability propositions 
may sometimes be true anyway. However, we can progress by assuming that 
L contains a special subspace for which the standard arguments will hold. 
Given a P-probability $(L, \to)$, let $X$ be a measure space and suppose that
the free distributive lattice on $X \times R$ is a sublattice of $L$ [26]. We'll refer
to the second component of $X \times R$ as "time" and will often denote it as a
subscript. For $A \subset X$, $A_t$ denotes $\bigvee_{a \in A} a_t$. We will see below that frequency
predictions follow if we assume that $X$ has properties that one would expect
of "the state of the system." In particular, we assume that for any time $t$,
$x_t \wedge y_t = 0$ for any $x, y \in X$ with $x \neq y$, meaning that "the system can't be in
two different states at the same time." Please note the clash of terminology 
with standard quantum theory where "state space" means a Hilbert space 
and not just a measure space. 

Given a state space $X$ and any fixed time $t$, we can relate probabilities to
functions from $X$ to $P$. For $a, b, c \in L$, let "wave functions" $\Psi_{a \to b} : X \to P$
be defined by

$$(a \to b \wedge \sigma_t) = \int_\sigma \Psi_{a \to b} \tag{4}$$

for all measurable $\sigma \subset X$. Such functions are therefore related by

$$\Psi_{a \to b \wedge c} = (a \to b)\, \Psi_{a \wedge b \to c} \tag{5}$$

in general and

$$\Psi_{a \to b \vee c} = \Psi_{a \to b} + \Psi_{a \to c} \tag{6}$$

if $b \wedge c = 0$.

In order to get real non-negative numbers from probabilities, we take $P$
to have a square norm $\| \cdot \| : P \to R^{0,+}$ satisfying $\| p q \| = \| p \| \, \| q \|$ for
$p, q \in P$. Given this, we will show that, under certain conditions,

$$\mathrm{Prob}_t(b|a) = \frac{\int_X \| \Psi_{a \to b} \|}{\int_X \| \Psi_{a \to 1} \|} \tag{7}$$

is a probability in the ordinary sense. When it doesn't cause confusion, we
will suppress the function name inside integrals as a notational convenience.
We may, for example, write

$$\mathrm{Prob}_t(b|a) = \frac{\int_X \| a \to b \wedge x_t \|}{\int_X \| a \to x_t \|}. \tag{8}$$

Note that probabilities like $(a \to b \wedge c \wedge x_t)$ are typically zero and, of course,
$(a \to x_t)$ isn't equal to $\Psi_a(x)$.

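To derive properties of $\mathrm{Prob}_t$, note that

$$\mathrm{Prob}_t(b \wedge c|a) = \frac{\int_X \| a \to b \wedge c \wedge x_t \|}{\int_X \| a \to x_t \|} \tag{9}$$

is equal to

$$\frac{\| a \to b \| \int_X \| a \wedge b \to c \wedge x_t \| \int_X \| a \wedge b \to x_t \|}{\int_X \| a \to x_t \| \int_X \| a \wedge b \to x_t \|} \tag{10}$$

and, rearranging and using $\| a \to b \| \, \| a \wedge b \to x_t \| = \| a \to b \wedge x_t \|$, we have

$$\mathrm{Prob}_t(b \wedge c|a) = \mathrm{Prob}_t(b|a)\, \mathrm{Prob}_t(c|a \wedge b) \tag{11}$$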
as desired. If we also knew that for $b \wedge c = 0$,

$$\mathrm{Prob}_t(b \vee c|a) = \mathrm{Prob}_t(b|a) + \mathrm{Prob}_t(c|a) \tag{12}$$

then we would have a complete standard probability theory and a frequency
meaning would follow as in the standard argument. However, (12) is true if
and only if

$$\int_X \| \Psi_{a \to b} + \Psi_{a \to c} \| = \int_X \| \Psi_{a \to b} \| + \int_X \| \Psi_{a \to c} \| \tag{13}$$

which, in a Hilbert space setting, is equivalent to requiring $\Psi_{a \to b}$ and $\Psi_{a \to c}$
to be orthogonal. Thus, we've concluded that we can predict frequencies, but
only for sublattices of $L$ for which (12) holds. This includes the sublattice
$X$ at any fixed time and the sublattice of propositions associated with a
Hermitian operator in the Hilbert space case.

For example, suppose that we have an orthogonal set of functions
$\{\phi_1, \phi_2, \ldots, \phi_n\}$ in the Hilbert space $L^2(X)$ and suppose that $L$ contains
the sublattice $B = \{b_1, b_2, \ldots, b_n\}$ where $b_i$ is the proposition "$\phi_i$ is the best
description of the system at time $t$." $B$ is a sublattice and (12) is satisfied
because $\langle \phi_i, \phi_j \rangle$ is zero for $i \neq j$. $\mathrm{Prob}_t$ on the sublattice $B$ is therefore
a probability theory in the ordinary sense and, for example,
$\mathrm{Prob}_t(b_j | \bigvee_{i=1}^{n} b_i)$ is the expected
frequency that $\phi_j$ is the best description of the system at time $t$ assuming
that one of the $\phi_1, \phi_2, \ldots, \phi_n$ is optimal.
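
The role of orthogonality in (13) can be illustrated on a finite state space. This is a minimal Python sketch under stated assumptions: complex probabilities, a four-point X with counting measure in place of the integral, and the square norm taken as the squared modulus. The names `sq_norm_sum`, `inner` and `additivity_gap` are hypothetical helpers, not notation from the text.

```python
def sq_norm_sum(psi):
    """Discrete stand-in for the integral over X of the square norm of psi."""
    return sum(z.real**2 + z.imag**2 for z in psi)

def inner(psi1, psi2):
    """Hilbert space inner product on the finite state space."""
    return sum(z1.conjugate() * z2 for z1, z2 in zip(psi1, psi2))

def additivity_gap(psi1, psi2):
    """Left side minus right side of (13); algebraically 2 * Re<psi1, psi2>."""
    total = [z1 + z2 for z1, z2 in zip(psi1, psi2)]
    return sq_norm_sum(total) - sq_norm_sum(psi1) - sq_norm_sum(psi2)

psi_b = [1 + 0j, 1j, 0j, 0j]
psi_c = [0j, 0j, 2 + 1j, 1 - 1j]   # orthogonal to psi_b
psi_d = [1 + 0j, 0j, 0j, 0j]       # overlaps with psi_b

print(inner(psi_b, psi_c), additivity_gap(psi_b, psi_c))
print(inner(psi_b, psi_d), additivity_gap(psi_b, psi_d))
```

For the orthogonal pair the gap vanishes and (12) holds; for the overlapping pair the nonzero gap is the "interference term" that prevents (12).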

As another example, consider how we would describe a Stern-Gerlach
experiment with quaternion probabilities and state space $X = R^3$. At any
time $t$ while the particle is heading towards the magnet, $X_t$ is a sublattice of
$L$ and $\mathrm{Prob}_t$ is a standard probability theory and predicts how often various
subsets of $X$ are occupied. At a time $t'$ when the particle has gone through
the magnet and either gone up or down, $X_{t'}$ is also a sublattice and $\mathrm{Prob}_{t'}$ is
also standard and predicts the results of the experiment. However, although
$X_t \cup X_{t'}$ is a sublattice of $L$, we cannot conclude that either $\mathrm{Prob}_t$ or
$\mathrm{Prob}_{t'}$ is a standard probability there because interference terms may prevent (12)
from being satisfied. This is why exotic probabilities aren't eliminated by
Bell's inequalities (see section 8). You can also see that this implies that the
Stern-Gerlach experiment is not a dynamical system. If there were a function
$f : X \to X$ such that a particle at $x_t$ always arrives at $f(x)_{t'}$, probabilities on
$X_t \cup X_{t'}$ would be determined by $\mathrm{Prob}_t$ and $f$. In this sense the Stern-Gerlach
system is realistic but not deterministic.

Thus, we have found that exotic probabilities can indeed acquire predic- 
tive power provided we introduce a "state space" within L and a square norm 
on $P$. Since the square norm property $\| p q \| = \| p \| \, \| q \|$ is crucial, we
conclude that probabilities must belong to a real associative algebra with a square
norm. There are, however, only three such algebras: the reals, the
complex numbers and the quaternions. This means that particles may
only be spin 0 or spin 1/2. Since (12) is only prevented by "interference
terms" we see that, in this sense, "standard probability theory is restored in
the classical limit."
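
The square norm property can be checked directly for the quaternions. The following is a minimal Python sketch, assuming quaternions are stored as 4-tuples (a, b, c, d) standing for a + bi + cj + dk and that the square norm is the sum of squared components; `qmul` and `sq_norm` are illustrative names.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def sq_norm(p):
    """Square norm: sum of the squared components."""
    return sum(x * x for x in p)

p = (1, 2, 3, 4)
q = (5, 6, 7, 8)

# The norm is multiplicative even though the product is not commutative.
print(sq_norm(qmul(p, q)), sq_norm(p) * sq_norm(q))
print(qmul(p, q), qmul(q, p))
```

With integer components the two printed norms agree exactly, while the two products differ, so the norm identity does not rely on commutativity.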

4 More about state spaces



As pointed out in reference 4, modifying probability theory means that we
are free to simply assume that if a particle arrives at a point $x_{t'}$ at a detecting
screen in a two slit experiment, the particle was therefore somewhere in
$X$ at any previous time $t < t'$. In general, we assume that

$$x_{t'} = x_{t'} \wedge X_t \tag{14}$$

for all $x \in X$, $t < t'$. This has immediate implications. For $t < t' < t''$,

$$(X_t \to X_{t''}) = (X_t \to X_{t'} \wedge X_{t''}) = (X_t \to X_{t'})(X_{t'} \to X_{t''}) \tag{15}$$

and if we also assume that probabilities are time invariant in the sense that
$(A_t \to B_{t'}) = (A_{t+\tau} \to B_{t'+\tau})$ for any $A, B \subset X$ and $t, t', \tau \in R$, then
$(X_t \to X_{t'}) = e^{\lambda (t'-t)}$ for some $\lambda \in P$. This implies that the wave function
at time $t'$ given $X_t$ is $e^{\lambda (t'-t)} \phi(x)$ for the time independent
$\phi : x \mapsto \lim_{t \to t'} (X_t \to x_{t'})$. To those used to quantum
mechanics, this may seem puzzling because, after assuming very little, we
concluded that "the system is in an energy eigenstate." What if the system
is, in fact, in some other state? If this question occurs to you, remember
that an exotic probability like $(X_t \to A_{t'})$ is only the best estimate that
$A_{t'}$ is true given that $X_t$ is known. If one knows some additional facts $F$
about the system, one should instead calculate $(X_t \wedge F \to A_{t'})$. Thus, our
wave functions only represent what one knows about a system and can't be 
interpreted as "the state of the system" in any reasonable sense. Different 
observers will have different knowledge about a system and they may also 
describe a single system with different wave functions. This means that if 
an observer does not know all the relevant facts about a system, their wave 
functions may give incorrect predictions. Of course, this is not a failure of 
exotic probability theory any more than it is a failure of ordinary probability 
theory when the usual analysis of a die fails in the case of a loaded die. In
both cases, the theories are successful to the extent that relevant facts are
known. From the Bayesian view, the particular result above means that if
one knows only that the system was somewhere in state space at time $t$,
then the best description of the system at any later time is one of the energy
eigenfunctions.

One last assumption completes what one intuitively means by a "state 
space." Intuitively, if one knows the "state" $x_t$ at time $t \in R$, then any
previous knowledge should be irrelevant. In this sense, it is natural to assume

$$(A_t \wedge x_{t'} \to B_{t''}) = (x_{t'} \to B_{t''}) \tag{16}$$

for any $t < t' < t''$, $A, B \subset X$, $x \in X$. This assumption also has immediate
consequences. For $A, B \subset X$, letting subscripts indicate time ordering and
using $\Psi_{a \to b}(x) = \Psi_a(x)(a \wedge x \to b)$,

$$(A_0 \to B_n) = \int_{x_1} \Psi_{A_0}(x_1)(A_0 \wedge x_1 \to B_n) = \int_{x_1} (A_0 \to x_1)(x_1 \to B_n) \tag{17}$$

and, repeating the same argument,

$$(A_0 \to B_n) = \int_{x_1, x_2, \ldots, x_{n-1}} (A_0 \to x_1)(x_1 \to x_2) \cdots (x_{n-1} \to B_n) \tag{18}$$

for any sequence of intermediate times $t_1, t_2, \ldots, t_{n-1}$. We can refer to such an
expression as a "path integral." Note that this expression together with the 
definition of Prob means that "paths interfere if they end at the same point 
in X." This is the exotic probability version of the "which path" principle 
of quantum mechanics. 
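
On a finite state space the "path integral" (18) becomes a finite sum over paths, and it coincides with a matrix power of the one-step exotic probabilities. The sketch below is a Python illustration under assumptions that are not in the text: a two-point state space with counting measure, single points in place of the sets A and B, and an arbitrary unnormalized complex matrix `T` of one-step values.

```python
from itertools import product

def path_sum(T, start, end, steps):
    """Sum over all intermediate paths of the product of one-step values T[x][y]."""
    n = len(T)
    total = 0
    for middle in product(range(n), repeat=steps - 1):
        states = (start, *middle, end)
        amp = 1
        for x, y in zip(states, states[1:]):
            amp = amp * T[x][y]
        total += amp
    return total

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(T, steps):
    """T multiplied by itself `steps` times (steps >= 1)."""
    result = T
    for _ in range(steps - 1):
        result = mat_mul(result, T)
    return result

# Hypothetical one-step exotic probabilities on a two-point state space.
T = [[1, 1j], [1j, 1]]

print(path_sum(T, 0, 1, 3), mat_pow(T, 3)[0][1])
# Two paths ending at the same point can cancel:
print(path_sum(T, 0, 0, 2))
```

The last line shows two paths from point 0 back to point 0 whose contributions cancel, a small instance of "paths interfere if they end at the same point in X."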



5 Definitions 

Before continuing on to physics, let's collect the definitions so far and estab- 
lish some terminology. For the rest of the paper, we assume lattices to be 
distributive and to have minimum and maximum elements denoted "0" and 
"1" respectively. By a "measure space," we always mean a measure space 
with a finite real non-negative measure. 

Fix $P = R$, $C$ or $H$. A P-probability is a lattice $L$ together with a
function $\to\, : L \times L \to P$ satisfying

$$(a \to b \wedge c) = (a \to b)(a \wedge b \to c) \tag{19}$$

for all $a, b, c \in L$ and satisfying

$$(a \to b \vee c) = (a \to b) + (a \to c) \tag{20}$$

for all $a, b, c \in L$ with $b \wedge c = 0$.

Here are a few simple examples. Let $L$ be the lattice $\{0, 1\}$ and let
$(a \to b)$ be 0 if $b$ is the minimum and 1 if $b$ is the maximum. This is a
P-probability. Given a lattice $L$, let $\phi : L \to P$ be some function satisfying
$\phi(a \wedge b) = \phi(a)\phi(b)$ in general and $\phi(a \vee b) = \phi(a) + \phi(b)$ if $a \wedge b = 0$.
Then $(a \to b) = \phi(b)$ makes $(L, \to)$ into a P-probability. Let $L$ be a totally
ordered lattice and let $(a \to b)$ be 1 if $a < b$ and 0 otherwise. This is also
a P-probability. Given a P-probability $(L, \to)$ and a sublattice $M$ of $L$, let
$l$ be an element of $L$. We can then define a new P-probability $(M, \to_l)$ by
letting $(a \to_l b) = (a \wedge l \to b)$ for $a, b \in M$.
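
The first and third examples can be verified by brute force on small lattices. This is a minimal Python sketch: the checker `is_p_probability` and the five-element chain are illustrative assumptions, and the non-strict variant `weak` is included only as a contrast that is not claimed in the text.

```python
from itertools import product

def is_p_probability(elements, meet, join, zero, arrow):
    """Check axioms (19) and (20) for every triple (a, b, c) of lattice elements."""
    for a, b, c in product(elements, repeat=3):
        if arrow(a, meet(b, c)) != arrow(a, b) * arrow(meet(a, b), c):
            return False
        if meet(b, c) == zero and arrow(a, join(b, c)) != arrow(a, b) + arrow(a, c):
            return False
    return True

# Example 1: L = {0, 1}, (a -> b) is 0 if b is the minimum and 1 if b is the maximum.
two_point = lambda a, b: b

# Example 3: a totally ordered lattice (here the chain 0 < 1 < 2 < 3 < 4),
# with (a -> b) = 1 if a < b and 0 otherwise.
strict = lambda a, b: 1 if a < b else 0

# Contrast (not from the text): the non-strict version of example 3.
weak = lambda a, b: 1 if a <= b else 0

print(is_p_probability((0, 1), min, max, 0, two_point))
print(is_p_probability(range(5), min, max, 0, strict))
print(is_p_probability(range(5), min, max, 0, weak))
```

On a chain, meet and join are simply `min` and `max`. The non-strict variant fails (20) at a = b = c = 0, which shows that the strict inequality in the third example matters.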

Following standard probability theory, we say that propositions $a, b \in L$
are independent if $(a \wedge q \to b) = (q \to b)$ for all $q \in L$ and this implies
$(q \to a \wedge b) = (q \to a)(q \to b)$ as usual. We say that subsets $A, B$ of $L$ are
independent if $a$ and $b$ are independent for all $a \in A$ and $b \in B$.

Given a P-probability $(L, \to)$, we can define the product of independent
sublattices $M$ and $N$ of $L$. Let $(M \times N, \to_\times)$ be defined by

$$((m, n) \to_\times (m', n')) = (m \to m')(n \to n'). \tag{21}$$

This defines a P-probability, even if $P$ is not commutative.

Let $X$ be a measure space and let $\mathcal{F}X$ be the free lattice on $X \times R$ subject
to

$$x_t \wedge y_t = 0 \tag{22}$$

for all $x, y \in X$, $x \neq y$, $t \in R$ and

$$x_{t'} = x_{t'} \wedge X_t \tag{23}$$

for $x \in X$ and times $t < t'$. A P-probability $(L, \to)$ is said to "have a state
space $X$" if $\mathcal{F}X$ is a sublattice of $L$ and if

$$(A_t \wedge x_{t'} \to B_{t''}) = (x_{t'} \to B_{t''}) \tag{24}$$

for all times $t < t' < t''$, for all subsets $A, B \subset X$ and for all $x \in X$.

6 A simple interferometer 

Figure 1: A simple interferometer where a particle enters as indicated and
encounters a beam splitter ($S_1$), a mirror ($M_1$ or $M_2$) and a second beam
splitter ($S_2$), ending up either in detector $D_1$ or $D_2$.

To exercise our ideas so far, let's analyze the interferometer shown in figure
1 in some detail. Although one is instinctively shy at first, we are free to use
simple language to describe what happens as if the particle was a marble.
Working within a C-probability with state space $X = R^3$, we can say that
a particle hits $S_1$ and either goes on the $P_1$ branch or the $P_2$ branch. After
hitting either mirror $M_1$ or $M_2$, the particle is on the $Q_1$ or the $Q_2$ branch
respectively. The particle will hit $S_2$ and will end up in either detector $D_1$ or
in detector $D_2$. Experimentally, one surprisingly finds that particles always
end up in $D_2$. Letting "e" informally denote the experimental arrangement,
we would like to calculate $(e \to D_1)$ and $(e \to D_2)$. Since $D_j$ implies both
$P_1 \vee P_2$ and $Q_1 \vee Q_2$, we have $(e \to D_j) = (e \to (P_1 \vee P_2) \wedge (Q_1 \vee Q_2) \wedge D_j)$.
Using $P_1 \wedge P_2 = Q_1 \wedge Q_2 = 0$, we mechanically apply axioms to produce

$$(e \to D_j) = \sum_{n,m=1}^{2} (e \to P_n)(e \wedge P_n \to Q_m)(e \wedge P_n \wedge Q_m \to D_j). \tag{25}$$

Since $P_n$ is equivalent to a point in $X$, previous knowledge is irrelevant and
we have $(e \wedge P_n \to Q_m) = (P_n \to Q_m)$ and, in the same way,
$(e \wedge P_n \wedge Q_m \to D_j) = (Q_m \to D_j)$. We also clearly want to assume that
the particle can't hop the rails, in other words we assume that $(P_n \to Q_m)$
is zero unless $n = m$. This causes one of the sums to disappear giving

$$(e \to D_j) = \sum_{n=1}^{2} (e \to P_n)(P_n \to Q_n)(Q_n \to D_j) \tag{26}$$
This result is not surprising, but the point to focus on is that the result follows 
rigorously from the exotic probability axioms with natural assumptions given 
the marble-like picture of what is happening. 

To proceed further, we have to define what happens at the mirrors and 
the beam splitters. Naturally, in either this case or in standard quantum 
theory, what one means by "a mirror" and "a beam splitter" has to be put 
in by hand. In the ideal case, what one means by a "mirror" is that complex
probabilities of a particle bouncing off of it pick up a factor of $i$. A good
experimentalist would naturally test this assumption in other measurements.
Similarly, the beam splitters multiply probabilities by a factor of $i$ when there
is a "bounce." Thus, $(e \to P_2) = i * (e \to P_1)$, $(Q_1 \to D_2) = i * (Q_1 \to D_1)$,
$(Q_2 \to D_1) = i * (Q_2 \to D_2)$ and $(P_1 \to Q_1) = (P_2 \to Q_2)$. Taking the two
straight-through factors $(Q_1 \to D_1)$ and $(Q_2 \to D_2)$ to be equal, the two terms
of (26) for $j = 1$ differ by a factor $i^2 = -1$ and so $(e \to D_1) = 0$ as expected.
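
The cancellation can be made concrete by evaluating (26) with the stated factors of i. This is a minimal Python sketch under explicit assumptions: the overall value of (e → P_1) and the straight-through factor at the second beam splitter are set to 1, the two straight-through factors are taken equal, and each mirror contributes i. The function names are illustrative only.

```python
def detector_amplitudes(e_P1=1, mirror=1j, through=1):
    """Evaluate the two-term sum (26) for D1 and D2.
    Assumed factors: a bounce at a beam splitter multiplies by i, each mirror
    multiplies by `mirror`, and going straight through S2 multiplies by `through`."""
    e_P = {1: e_P1, 2: 1j * e_P1}                 # branch 2 is the bounce at S1
    P_Q = {1: mirror, 2: mirror}                  # (P1 -> Q1) = (P2 -> Q2)
    Q_D = {(1, 1): through, (1, 2): 1j * through, # from Q1, D2 is the bounce
           (2, 2): through, (2, 1): 1j * through} # from Q2, D1 is the bounce
    return {j: sum(e_P[n] * P_Q[n] * Q_D[(n, j)] for n in (1, 2)) for j in (1, 2)}

def detector_probabilities(amps):
    """Ordinary probabilities from the square norm, normalized over the detectors."""
    weights = {j: abs(a) ** 2 for j, a in amps.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

amps = detector_amplitudes()
print(amps)
print(detector_probabilities(amps))
```

The two contributions to D1 differ by the factor i times i and cancel, so every particle is predicted to arrive at D2, in line with the experimental statement above.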

Suppose now that the interferometer is such that a device could be attached
to $M_1$ such that it registered "hit" or "nohit" depending on whether
the particle struck $M_1$ or not. Experimentally the results are different and
about half the particles go into $D_1$. In quantum theory, one says that this
is due to the "which path" principle. The two paths ending in $D_1$ no longer
interfere because "you can tell which path was taken." You can see that this
result also follows mechanically with exotic probabilities. In the described
situation, the original state space $X$ is evidently not sufficient and one should use at
least $X \times \{\mathrm{hit}, \mathrm{nohit}\}$. In this case, one can explicitly calculate that the
interference is lost because two paths ending in $D_1$ no longer end at the same
point in the state space. One can also calculate that if the device detecting
whether $M_1$ is hit works so poorly that {hit, nohit} are independent of $Q_1$
and $Q_2$, then the interference effect is entirely restored.
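
Both cases can be sketched numerically by enlarging the final state to a pair (detector, device reading). The Python sketch below rests on assumptions that are not spelled out in the text: the branch factors are those of the ideal interferometer with unit overall normalization, mirror $M_1$ sits on branch 1, and the device contributes one extra factor `record[n][r]` for reading r on branch n. All names are hypothetical.

```python
I = 1j
# Branch factors (e -> P_n)(P_n -> Q_n)(Q_n -> D_j), keyed by (n, j).
PATH = {
    (1, 1): 1 * I * 1,   # branch 1 to D1: straight, mirror, straight
    (2, 1): I * I * I,   # branch 2 to D1: bounce, mirror, bounce
    (1, 2): 1 * I * I,   # branch 1 to D2: straight, mirror, bounce
    (2, 2): I * I * 1,   # branch 2 to D2: bounce, mirror, straight
}
READINGS = ("hit", "nohit")

def detector_probabilities(record):
    """Paths are added only when they end at the same point (D_j, reading);
    square norms of different end points are then summed and normalized."""
    weights = {}
    for j in (1, 2):
        weights[j] = sum(
            abs(sum(PATH[(n, j)] * record[n][r] for n in (1, 2))) ** 2
            for r in READINGS
        )
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# A perfect device: "hit" exactly when the particle is on branch 1.
perfect = {1: {"hit": 1, "nohit": 0}, 2: {"hit": 0, "nohit": 1}}
# A useless device: the reading is independent of the branch.
useless = {1: {"hit": 1, "nohit": 1}, 2: {"hit": 1, "nohit": 1}}

print(detector_probabilities(perfect))
print(detector_probabilities(useless))
```

With the perfect device the two routes to $D_1$ end at different points of the enlarged state space, so about half the particles reach $D_1$; with the useless device the routes end at the same points again and the cancellation at $D_1$ returns.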

Note the difference with standard quantum theory. Quantum mechanics 
has no problem with this interferometer in the sense that the wave equation 
can be solved for any desired input wave packet. Of course, no one wants to 
do this, especially to get such simple results. This explains the popularity 
of the "which path" principle even though it is not completely clear what 
it means or how it follows from the fundamental wave equation. This is 
analogous to doing probability theory knowing the diffusion equation but not 
knowing Kolmogorov's axioms. In exotic probabilities, on the other hand, 
both a rigorous version of the "which path" principle and any wave equation 
are consequences of the underlying exotic probability theory. 

7 Exponential Decay 

The interferometer from the previous section suggests that exotics may be 
particularly helpful in situations where one wants predictions which are in- 
dependent of details of initial wave functions and potentials. "Exponential 
decay" provides simple examples of such situations and also brings up one of 
the lesser known mysteries of quantum theory. Consider a system such as a 
Co$^{60}$ nucleus or a muon which may decay irreversibly. Given such a system,
if the probability for a decay within a time interval $t$ only depends on $t$ and
not on the history of the system, then a familiar argument in probability theory
implies that the probability density for decay is exponential. Quantum
mechanics, however, does not generally predict this and so it would seem
that for such non-exponential systems, the assumption that they decay
independent of their history is not correct. As with other paradoxes, we can
resolve this by realizing that the physical assumptions are correct; the problem
is caused by probability theory itself. Applying the physical assumptions
to exotic probability theory instead, we suppose that in a P-probability with
state space $X$, $(A_t \to B_{t'}) = (A_{t+\tau} \to B_{t'+\tau})$ for all $t, t', \tau \in R$. Suppose also
that $X$ contains a subset $\alpha$ whose complement $\beta$ is a "trap" in the sense that
$\beta_t$ implies $\beta_{t'}$ for any $t < t'$. This means that $\alpha_{t'}$ implies $\alpha_t$ for any $t < t'$
also. With arguments similar to those in section 4, we find $(\alpha_0 \to \alpha_t) = e^{\lambda t}$,
$(\beta_0 \to \beta_t) = 1$, $(\alpha_0 \to \beta_t) = a (1 - e^{\lambda t})$, and $(\beta_0 \to \alpha_t) = 0$ for some
$\lambda \in P$ and $a \in R$. Although the exotic probabilities are simple exponentials,
this isn't preserved in the predicted frequencies. The ordinary probability to 
remain free for time t is 

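$$\mathrm{Prob}(\alpha_t|\alpha_0) = \frac{\int_\alpha \| \alpha_0 \to x_t \|}{\int_\alpha \| \alpha_0 \to x_t \| + \int_\beta \| \alpha_0 \to x_t \|} \tag{27}$$

and, using $\| \alpha_0 \to x_t \| = \| \alpha_0 \to \alpha_t \| \, \| \alpha_t \to x_t \|$ for $x \in \alpha$ and
$\| \alpha_0 \to x_t \| = \| \alpha_0 \to \beta_t \| \, \| \alpha_0 \wedge \beta_t \to x_t \|$ for $x \in \beta$, we have

$$\mathrm{Prob}(\alpha_t|\alpha_0) = \frac{1}{1 + k(t)\, \| e^{-\lambda t} - 1 \|} \tag{28}$$

where

$$k(t) = a^2\, \frac{\int_\beta \| \alpha_0 \wedge \beta_t \to x_t \|}{\int_\alpha \| \alpha_t \to x_t \|}. \tag{29}$$

For small $t$ and assuming that $\lambda$ is real and negative, $\mathrm{Prob}(\alpha_t|\alpha_0)$ will decrease
more slowly than $1 - 2|\lambda| t$. If we also know that $\alpha_0$ and $x_t \in \beta_t$ can be taken
to be independent for sufficiently large $t$, then we say that the system is
"forgetful." In this case, $k(t)$ is asymptotically constant and $\mathrm{Prob}(\alpha_t|\alpha_0)$
will be exponential for large times. Such deviations from exponential decay
have only recently been observed experimentally.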
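
The two regimes of (28) can be seen in a short numerical sketch. This Python example assumes a real negative λ, so that the square norm of a real number r is r squared, and it takes k(t) to be a constant `k` for all times, which is a simplification of the "forgetful" case rather than something the text asserts. The function name `survival` is illustrative.

```python
from math import exp

def survival(t, lam=-1.0, k=1.0):
    """Equation (28) for real lam and constant k: 1 / (1 + k * (exp(-lam t) - 1)^2)."""
    return 1.0 / (1.0 + k * (exp(-lam * t) - 1.0) ** 2)

lam, k = -1.0, 1.0

# Small times: the loss 1 - Prob grows like k * lam^2 * t^2, slower than linear.
t_small = 1e-3
print((1.0 - survival(t_small, lam, k)) / t_small**2, k * lam**2)

# Large times: Prob approaches the exponential law exp(2 * lam * t) / k.
t_large = 10.0
print(survival(t_large, lam, k), exp(2 * lam * t_large) / k)
```

The first line shows the quadratic onset at small times, and the second shows agreement with a pure exponential at large times, which are the two qualitative statements made above.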



The examples of the last two sections show the usefulness of applying 
exotic probability theory directly as opposed to solving a PDE. This sort of 
reasoning is mostly missing in standard quantum theory. 



8 Bell's inequalities 

Bell's well known analysis of the spin version of the Einstein-Podolsky-Rosen 
experiment is almost universally summarized as showing that local realistic
theories are incompatible with the predictions of quantum mechanics
and are therefore wrong. One might then expect that exotic probabilities
would be ruled out by Bell because they are "realistic" in the state space
sense. Bell's analysis, however, does not follow once we modify probability
theory. To see the problem, you only have to notice that the first step in
Bell's analysis assumes that $P(M_{t'}|e) = P(M_{t'} \wedge \Lambda_t|e)$ and

$$P(M_{t'} \wedge \Lambda_t|e) = \int_{\lambda \in \Lambda} P(M_{t'} \wedge \lambda_t|e) = \int_{\lambda \in \Lambda} P(\lambda_t|e)\, P(M_{t'}|e \wedge \lambda_t) \tag{30}$$

for initial setup $e$, final measurement $M_{t'}$ and assuming that the final results
are determined by some "hidden variable" $\lambda \in \Lambda$ at some time $t$ during the
flight from decay to detectors. As pointed out in section 3, equation 30 fails
to hold in general due to "interference terms." In fact, Bell has shown
exactly that if one wants local realism one must modify probability theory. 
Ironically, the standard summary of his results gives the opposite impression. 

Over the years, there have been more than twenty variations on Bell's 
result each with a different experimental arrangement and each concluding 
that local realistic theories are impossible. Bell's result and two of the more 
well known variations are considered in reference 3 in some detail and are 
shown not to eliminate exotic probabilities. There has also been an increasing 
tendency to refer to Bell and similar results as "non-local" effects because 
they cannot be explained by local correlations. The point is, however, that
if one has the wrong probability theory, one may also have the wrong notion 
of what is just a correlation. Within exotic probability theory, we expect that 
Bell's results are just correlations in the new probability theory. It's helpful 
to think of a classical experiment where one cuts a penny into a heads half 
and a tails half and mails one half penny to house A and the other half to 
house B. The results at the two houses are correlated, but nothing travels 
between them to ensure the proper results. One therefore expects that there
is nothing that one can do at house A to affect the fact that, at house B, 
one will find heads 50% of the time and tails 50% of the time. The same 
holds true in the EPR experiment. The results at one end of the experiment 
are 50% spin up and 50% spin down independent of the magnet orientation, and
nothing that happens on the other side can affect this.



9 Time evolution



Given some initial knowledge such as $A_t$ with $A \subset X$, the exotic probability
to arrive at some $B \subset X$ at some later time $t''$ is given by

$$(A_t \to B_{t''}) = \int_{x \in X} (A_t \to x_{t'})(x_{t'} \to B_{t''}) \tag{31}$$

for any time $t'$ with $t < t' < t''$. This is called the Chapman-Kolmogorov
equation in the probability literature. In the complex case with state space
$R^d$, one can either follow reference 4 or Risken to conclude that for small
$\tau \in R$ and small $z \in X$, $(x_t \to (x + z)_{t+\tau})$ is given by

$$\frac{1}{(2\pi\tau)^{d/2}\sqrt{\det(\nu)}} \exp\left(-\tau\left[\frac{1}{2}\left(\frac{z_j}{\tau} - \nu_j\right)\nu^{-1}_{jk}\left(\frac{z_k}{\tau} - \nu_k\right) + \nu_0\right]\right) \tag{32}$$

where $\nu_0$, $\nu_j$ and $\nu_{jk}$ are moments of the time derivative of
$\omega(x, z, \tau) = (x_t \to (x + z)_{t+\tau})$ defined by complex functions
$\nu_0(x) = \int_z \omega_\tau(x, z, 0)$, $\nu_j(x) = \int_z \omega_\tau(x, z, 0) z_j$ and
$\nu_{jk}(x) = \int_z \omega_\tau(x, z, 0) z_j z_k$. This is a central-limit-theorem-like
phenomenon where the details of the unknown function $(x_t \to (x + z)_{t+\tau})$
are smoothed over and only a dependence on its lowest moments survives.
Identifying $z_j/\tau$ as the velocity, equation 32 is equivalent, for example, to
the Schrödinger equation in $R^3$ identifying $\nu_0 = -ieA_0$, $\nu_j = (e/m)A_j$ and
$\nu_{jk} = (i/m)\delta_{jk}$. Similarly, quaternion probabilities result in the Dirac
equation. These arguments need to be made into proofs, but there is
also a mystery as to why only parts of the available moments seem to be
used by nature. Why, for instance, must $\nu_j$ be purely real in $R^d$?
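
The consistency of a Gaussian short-time form with the Chapman-Kolmogorov equation (31) can be checked numerically in the simplest setting. The Python sketch below makes several assumptions that go beyond the text: one dimension, moments that do not depend on x, a vanishing zeroth moment, and real positive second moment (the ordinary diffusion case), since the complex case leads to oscillatory integrals that a plain Riemann sum does not handle. The names `kernel`, `composed`, `drift` and `spread` are illustrative.

```python
from math import exp, pi, sqrt

def kernel(z, tau, drift=0.5, spread=1.0):
    """One-dimensional Gaussian short-time form with constant real moments:
    first moment `drift`, second moment `spread`, zeroth moment taken as 0."""
    pref = 1.0 / sqrt(2.0 * pi * tau * spread)
    return pref * exp(-tau * 0.5 * (z / tau - drift) ** 2 / spread)

def composed(z, tau, lo=-10.0, hi=10.0, steps=4000):
    """Riemann-sum version of (31): integrate over the intermediate displacement
    for two successive steps of duration tau each."""
    h = (hi - lo) / steps
    return h * sum(
        kernel(lo + i * h, tau) * kernel(z - (lo + i * h), tau)
        for i in range(steps + 1)
    )

# Two steps of duration tau should reproduce one step of duration 2 * tau.
print(composed(0.3, 0.5), kernel(0.3, 1.0))
```

In this restricted setting the two printed numbers agree to numerical precision, which illustrates why only the lowest moments are needed to propagate over finite times.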



10 Comparison with quantum theory 

In standard quantum theory, the state of the system is a ray in a Hilbert 
space. To define such a theory one must define a Hilbert space and a complete 
set of mutually commuting self-adjoint operators to serve as observables. In 
addition, one chooses a Hamiltonian and labels the states in the Hilbert 
space by irreducible representations of the Hamiltonian's symmetry group. 
For Hamiltonians invariant under the Lorentz group, states have spin and 
four-momenta. Time evolution is a one parameter semigroup given by the 
Hamiltonian operator. If "mixed states" occur, they must be described by
density matrices. Quite a bit of functional analysis must be understood to
define this precisely.

In an exotic probability theory, on the other hand, the state of the system 
is a point in a measure space X. To define the theory, one simply chooses 
X and picks R, C or H. Particles are not thought of as having momentum 
or spin, or any other internal structure. The only thing that a particle can 
do is to be somewhere. This is all that is required, however, because experi- 
ments which measure things like momentum and spin are always ultimately 
measuring position. Wave functions have the same status as densities do in 
Bayesian theory. People with different knowledge about a system will, in 
general, use different wave functions. Those who have more knowledge can 
expect better predictions. Situations requiring "mixed states" in quantum 
theory are described by the same exotic theory without modification and, 
similarly, there is no sensible concept of "being in a mixed state." Rather 
than choosing a Hamiltonian, one notes that wave functions are propagated 
in time by the unknown $(x_t \to x'_{t'})$. In typical state spaces this propagation 
obeys a PDE which depends only upon the lowest moments of $(x_t \to x'_{t'})$ 
and these moments are identified with the vector potential and metric tensor. 
The relevant moments can either be measured experimentally with test 
particles or computed with some external theory like Maxwell's equations. 
One does not assume Lorentz or gauge invariance to get these results. 

11 Implications for the rest of physics and 
open questions 

Physical theories are thought to be quantum theories only in a somewhat 
general sense. The successful predictions of quantum mechanics must, of 
course, be reproduced, but this is not taken to mean that any theory must 
literally satisfy the axioms of quantum theory. There is, however, an inde- 
pendent reason why physical theories must be precisely exotic probability 
theories. The results of sections two and three indicate that any theory which 
assigns likelihoods to pairs of propositions from a distributive lattice must 
exactly be an exotic probability theory or must violate one of our two Cox 
conditions or must fail to reduce to standard probability theory when pre- 
dicting frequencies. Physical theories are constrained by the results here just 
as alternatives to standard probability theory are constrained by Cox's original 
arguments. This raises many questions about how the same program 
should be carried out for the rest of physics. 

In the case of field theory, Srinivasan has pioneered the application of exotic 
probabilities to quantum field theory by calculating the Lamb shift in a 
quaternionic version of canonical quantization. His results agree with QED 
without any renormalization procedure. In addition to Srinivasan's approach, 
it is clear in a very simple sense that electrons must emit photons because 
the vector potential remains unknown even when the electromagnetic field 
has been measured. Even in the case of a single electron, one must therefore 
sum over the various possible gauge equivalent vector potentials. One has no 
choice but to predict that an electron will have various possible motions and 
these will be correlated with various possible vector potentials. It is reason- 
able to expect that this simple effect should fit naturally in the framework 
of a complete field theory. This, however, has not been done. Also, similar 
considerations hold for the metric tensor and weighted sums over various 
possible metric tensors must similarly be finite. Does this then mean that 
one could calculate gravitational radiation? 

Exotic probability theories are much more restrictive than quantum me- 
chanics in the sense that the form of the vector potential and metric tensor 
is already determined by the choice of state space and probability. Since the 
choice of probability seems to be fixed by spin, one apparently only has the 
state space left to explain gauge theories other than QED. Can 
Yang-Mills theories be formulated as exotic probability theories, and, if so, 
with what state space? 

Other questions arise if we sketch the general procedure for finding a PDE 
for wave functions. The basic theory here is formulated with a state space X 
only assumed to be a measure space. Assuming that X also has a topology, 
consider a point $x$ in an open set $O \subset X$. One assumes that a time difference 
$t' - t$ can be chosen such that $(x_t \to x'_{t'})$ is negligible for $x'$ outside of $O$. 
In addition, we suppose that $O$ can be chosen such that $(x_t \to x'_{t'})$ can be 
approximated by a function of only $x' - x$ and $t' - t$. Given this, the path 
integral within $O$ collapses to a convolution and this can be inverted with a 
Fourier transform resulting in a kernel depending only on the lowest moments 
of the time derivative of $(x_t \to x'_{t'})$ as in section 9. Another way to think 
about this is to consider the ring of $P$-valued functions on $O$ with pointwise 
addition and convolution as multiplication. In this case, we assume that 
these rings have units $K_t$ in a "Dirac sequence" sense: $\lim_{t \to 0} K_t * f = f$ 
and $\lim_{s,t \to 0} K_s * K_t = K_{s+t}$. This can be solved by considering a slight 
generalization of the standard quadratic form: a function $q : V \to P$ where 
$q(x + y) - q(x) - q(y) = b(x, y)$ for a symmetric bilinear $b : V \times V \to P$. Then 
$K_t : x \mapsto e^{q(x - at)/t}/I_t$, $a \in V$, $I_t = \int_V e^{q(x)/t}$ provides solutions. As mentioned 
in section 9, this raises the question of why only certain of these $K_t$ are seen in 
nature. There is also the question of what exactly must be assumed about $X$ 
since, besides a topology, we only seem to need subtraction of nearby points 
in $O$. For instance, it is perhaps interesting to remove geometry entirely by 
allowing any multiplication on $\mathrm{Hom}(O, P)$ which forms a ring with pointwise 
addition and has a unit in the Dirac sequence sense. 
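
The Dirac sequence property can be illustrated numerically in the simplest real case. The sketch below assumes $V = \mathbb{R}$, the quadratic form $q(x) = -x^2/2$ and drift $a = 0.5$ (all choices made here for illustration, not taken from the text). It builds $K_t(x) = e^{q(x - at)/t}/I_t$ with $I_t$ computed by quadrature and evaluates $(K_t * f)(x_0)$ for decreasing $t$. The helper names `q`, `integrate` and `smoothed` are hypothetical.

```python
import math


def q(x):
    # Assumed quadratic form on V = R; gives an ordinary Gaussian kernel.
    return -0.5 * x * x


def integrate(g, lo, hi, n=4000):
    # Composite trapezoid rule on [lo, hi] with n sub-intervals.
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h


def smoothed(f, x0, t, a=0.5):
    """Evaluate (K_t * f)(x0) with K_t(y) = exp(q(y - a*t)/t) / I_t."""
    w = 10.0 * math.sqrt(t)  # ten kernel widths on either side
    norm = integrate(lambda x: math.exp(q(x) / t), -w, w)  # I_t
    return integrate(
        lambda y: math.exp(q(y - a * t) / t) / norm * f(x0 - y),
        a * t - w,
        a * t + w,
    )


x0 = 0.7
for t in (1.0, 0.1, 0.01):
    # The gap between (K_t * f)(x0) and f(x0) should shrink as t decreases.
    print(t, abs(smoothed(math.cos, x0, t) - math.cos(x0)))
```

With these choices the gap shrinks as $t$ decreases, which is the sense in which $K_t$ acts as a unit for convolution in the limit $t \to 0$. A $P$-valued $q$ with complex or quaternionic values would need a different quadrature and is not covered by this sketch.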

Although Srinivasan has worked in field theory directly, simple multi-particle 
systems have not been treated with exotic probabilities. In particular, 
what is the relationship between spin and statistics for exotic probabilities? 
This seems likely to be interestingly different from standard field theory. 

Although the time parameter in exotics seems essential once the state 
space axioms are introduced, this does not mean that exotics are nonrela- 
tivistic. "Time" in the complex theory, for example, can be interpreted 
as the proper time or path length parameter. One suspects, however, that 
"time" is really the order in which one discovers facts about the system 
rather than anything more intrinsic. In this case, one might expect that 
automorphisms of the time parameter should result in equivalent theories 
with modified moments of $(x_t \to x'_{t'})$. Is this correct and, if so, what are the 
consequences of invariance under time automorphisms? 

The fact that the vector potential appears as the first moment of the time 
derivative of $(x_t \to x'_{t'})$ suggests that Maxwell's equations should describe 
complex or quaternionic vector potentials. Are there complex and quaternionic 
versions of Maxwell's equations and, if so, are their classical predictions 
correct? 

The whole area of "Bayesian Inference" in ordinary probability theory is 
based on the idea that one can use Bayes' theorem (which also follows in 
exotics) to systematically improve probabilities based on "prior" knowledge. 
It is clear that the same thing should be possible with exotic probabilities. 
In the standard Bayesian case, this is often based on the maximum entropy 
principle. The issue, then, is how to do Bayesian inference with exotic 
probabilities and whether there is an analogue of maximum entropy. 

12 Summary 



Exotic probability theories as described here appear to be the only general- 
ization of probability theory consistent with the basic Bayesian framework. 
In addition to standard probability theory, we find that three exotic copies 
are possible where probabilities are real, complex or quaternion valued, 
respectively. Although the exotic theories are substantially simpler than 
quantum mechanics both conceptually and mathematically, they nevertheless give 
the same predictions as standard quantum theory. These theories constrain 
physical theories in the same sense that Cox's original arguments constrain 
possible alternatives to standard probability theory. The implications of this 
beyond basic quantum theory are mostly unexplored, but we have attempted 
to at least formulate some fundamental open questions where new insights 
are needed. 